CASE-QA: Context and Syntax embeddings for Question Answering On Stack Overflow
Abstract
Question answering (QA) systems rely on both knowledge bases and unstructured text corpora. Domain-specific QA presents a unique challenge, since relevant knowledge bases are often lacking and unstructured text is difficult to query and parse. This project focuses on the QUASAR-S dataset (Dhingra et al., 2017), constructed from the community QA site Stack Overflow. QUASAR-S consists of Cloze-style questions about software entities and a large background corpus of community-generated posts, each tagged with relevant software entities. We incorporate the tag entities as context for the QA task and find that modeling co-occurrence of tags and answers in posts leads to significant accuracy gains. To this end, we propose CASE, a hybrid of an RNN language model and a tag-answer co-occurrence model, which achieves state-of-the-art accuracy on the QUASAR-S dataset. We also find that this approach (modeling both question sentences and context-answer co-occurrence) is effective for other QA tasks. Using only language and co-occurrence modeling on the training set, CASE is competitive with the state-of-the-art method on the SPADES dataset (Bisk et al., 2016), a method that uses a knowledge base.
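The abstract does not say how CASE combines its two components, so the following is only an illustrative sketch of the general idea: score each candidate answer with a language-model log-probability and with a tag-answer co-occurrence estimate, then rank by a weighted sum. The function names, the add-alpha smoothing, the averaging over tags, and the `weight` parameter are all assumptions made for this example, and the language-model scores are passed in as a plain dictionary rather than produced by an RNN.

```python
from collections import Counter, defaultdict
import math


def build_cooccurrence(posts):
    """Count, for each tag, how many posts with that tag mention each entity.

    `posts` is an iterable of (tags, entities) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for tags, entities in posts:
        for tag in tags:
            for entity in set(entities):  # count each entity once per post
                counts[tag][entity] += 1
    return counts


def cooccurrence_log_prob(counts, tags, candidate, vocab_size, alpha=1.0):
    """Add-alpha smoothed log P(candidate | tag), averaged over the tags.

    Averaging over tags is an assumption of this sketch, not the paper's method.
    """
    total = 0.0
    for tag in tags:
        tag_counts = counts.get(tag, Counter())
        numerator = tag_counts[candidate] + alpha
        denominator = sum(tag_counts.values()) + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total / max(len(tags), 1)


def rank_candidates(counts, tags, lm_log_probs, weight=0.5):
    """Rank candidates by a weighted sum of LM and co-occurrence log-probabilities.

    `lm_log_probs` maps each candidate to a language-model log-probability
    (a stand-in for the RNN language model's output).
    """
    vocab_size = len(lm_log_probs)
    scores = {
        candidate: (1 - weight) * lm_lp
        + weight * cooccurrence_log_prob(counts, tags, candidate, vocab_size)
        for candidate, lm_lp in lm_log_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With a uniform language-model score, the ranking is driven entirely by how often each candidate appears in posts carrying the question's tags, which is the context signal the abstract describes.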
Similar resources
Assessing the Performance of Question-and-Answer Communities Using Survival Analysis
Question-&-Answer (QA) websites have emerged as efficient platforms for knowledge sharing and problem solving. In particular, the Stack Exchange platform includes some of the most popular QA communities to date, such as Stack Overflow. Initial metrics used to assess the performance of these communities include summative statistics like the percentage of resolved questions or the average time to...
Creating Causal Embeddings for Question Answering with Minimal Supervision
A common model for question answering (QA) is that a good answer is one that is closely related to the question, where relatedness is often determined using general-purpose lexical models such as word embeddings. We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined thr...
Answering Live Questions from Heterogeneous Data Sources SMART in Live QA at TREC 2016
A significant portion of information is available today in a digital format. However, users still face difficulties in accessing it. A big part of the challenge consists in designing efficient approaches for reasoning over heterogeneous data sources. In this paper, we describe the participation of the Semantic Search and Question Answering group (SMART) in the Live QA track at TREC 2016. SMART s...
Detecting Duplicate Posts in Programming QA Communities via Latent Semantics and Association Rules
Programming community-based question-answering (PCQA) websites such as Stack Overflow enable programmers to find working solutions to their questions. Despite detailed posting guidelines, duplicate questions that have been answered are frequently created. To tackle this problem, Stack Overflow provides a mechanism for reputable users to manually mark duplicate questions. This is a laborious eff...
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...